Abstract

The goal of this expository paper is to describe conditions which guarantee a central limit 
theorem for functionals of general state space Markov chains. This is done with a view towards 
Markov chain Monte Carlo settings and hence the focus is on the connections between drift and 
mixing conditions and their implications. In particular, we consider three commonly cited central 
limit theorems and discuss their relationship to classical results for mixing processes. Several 
motivating examples are given which range from toy one-dimensional settings to complicated 
settings encountered in Markov chain Monte Carlo. 

1 Introduction 

Let $X = \{X_i : i = 0, 1, 2, \ldots\}$ be a Harris ergodic Markov chain on a general space $\mathsf{X}$ with invariant probability distribution $\pi$ having support $\mathsf{X}$. Let $f$ be a Borel function and define $\bar f_n := n^{-1} \sum_{i=1}^{n} f(X_i)$ and $E_\pi f := \int_{\mathsf{X}} f(x)\,\pi(dx)$. When $E_\pi |f| < \infty$ the ergodic theorem guarantees that $\bar f_n \to E_\pi f$ with probability 1 as $n \to \infty$. The main goal here is to describe conditions on $X$ and $f$ under which a central limit theorem (CLT) holds for $\bar f_n$; that is,

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2) \tag{1}$$

as $n \to \infty$ where $\sigma_f^2 := \mathrm{var}_\pi\{f(X_0)\} + 2 \sum_{i=1}^{\infty} \mathrm{cov}_\pi\{f(X_0), f(X_i)\} < \infty$. Although all of the results presented in this paper hold in general, the primary motivation is found in Markov chain Monte Carlo (MCMC) settings where the existence of a CLT is an extremely important practical problem. Often $\pi$ is high dimensional or known only up to a normalizing constant but the value of $E_\pi f$ is required. If $X$ can be simulated then $\bar f_n$ is a natural estimate of $E_\pi f$. The existence of a CLT then allows one to estimate $\sigma_f^2$ in order to decide if $\bar f_n$ is a good estimate of $E_\pi f$. (Estimation of $\sigma_f^2$ is challenging and requires specialized techniques that will not be considered further here; see Jones et al. (2004) and Geyer (1992) for an introduction.) Thus the existence of a CLT is crucial to sensible implementation of MCMC; see Jones and Hobert (2001) for more on this point of view.

The following simple example illustrates one of the situations common in MCMC settings.

Example 1. Consider a simple hard-shell (also known as hard-core) model. Suppose $\mathcal{X} = \{1, \ldots, n_1\} \times \{1, \ldots, n_2\} \subseteq \mathbb{Z}^2$. A proper configuration on $\mathcal{X}$ consists of coloring each point either black or white in such a way that no two adjacent points are white. Let $\mathsf{X}$ denote the set of all proper configurations on $\mathcal{X}$, $N_{\mathsf{X}}(n_1, n_2)$ be the total number of proper configurations and $\pi$ be the uniform distribution on $\mathsf{X}$ so that each proper configuration is equally likely. Suppose our goal is to calculate the typical number of white points in a proper configuration; that is, if $W(x)$ is the number of white points in $x \in \mathsf{X}$ then we want the value of

$$E_\pi W = \sum_{x \in \mathsf{X}} \frac{W(x)}{N_{\mathsf{X}}(n_1, n_2)} .$$

If $n_1$ and $n_2$ are even moderately large then we will have to resort to an approximation to $E_\pi W$. Consider the following Markov chain on $\mathsf{X}$. Fix $p \in (0, 1)$ and set $X_0 = x_0$ where $x_0 \in \mathsf{X}$ is an arbitrary proper configuration. Randomly choose a point $(x, y) \in \mathcal{X}$ and independently draw $U \sim \text{Uniform}(0, 1)$. If $u \le p$ and all of the adjacent points are black then color $(x, y)$ white leaving all other points alone. Otherwise, color $(x, y)$ black and leave all other points alone. Call the resulting configuration $X_1$. Continuing in this fashion yields a Harris ergodic Markov chain $\{X_0, X_1, X_2, \ldots\}$ having $\pi$ as its invariant distribution. It is now a simple matter to estimate $E_\pi W$ with $\bar w_n$. Also, since $\mathsf{X}$ is finite (albeit potentially large) it is well known that $X$ will converge exponentially fast to $\pi$ which implies that a CLT holds for $\bar w_n$.
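
A minimal simulation sketch of the chain in Example 1 follows. It is illustrative only: the function names, the all-black starting configuration, and the choice p = 1/2 (for which each single-site update is an exact conditional draw under the uniform distribution on proper configurations) are assumptions made here, not part of the original example.

```python
import random

def is_proper(config):
    """Return True if no two horizontally or vertically adjacent sites are white."""
    n1, n2 = len(config), len(config[0])
    for i in range(n1):
        for j in range(n2):
            if config[i][j]:
                if i + 1 < n1 and config[i + 1][j]:
                    return False
                if j + 1 < n2 and config[i][j + 1]:
                    return False
    return True

def hard_core_step(config, p, rng):
    """One transition: pick a site uniformly at random; colour it white when
    u <= p and every neighbour is black, otherwise colour it black."""
    n1, n2 = len(config), len(config[0])
    i, j = rng.randrange(n1), rng.randrange(n2)
    u = rng.random()
    neighbours = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    all_black = all(
        not config[a][b] for a, b in neighbours if 0 <= a < n1 and 0 <= b < n2
    )
    config[i][j] = (u <= p) and all_black  # True = white, False = black

def estimate_mean_white(n1, n2, n_steps, p=0.5, seed=0):
    """Ergodic average of the number of white points along the chain."""
    rng = random.Random(seed)
    config = [[False] * n2 for _ in range(n1)]  # all black is a proper start
    total = 0
    for _ in range(n_steps):
        hard_core_step(config, p, rng)
        total += sum(map(sum, config))
    return total / n_steps, config

# Hypothetical usage on a small 3 x 3 lattice.
estimate, final_config = estimate_mean_white(3, 3, 20000)
print(estimate, is_proper(final_config))
```

On a 2 x 2 lattice there are 7 proper configurations and the exact answer is 8/7, which gives a convenient sanity check for the sketch.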



Following the publication of the influential book by Meyn and Tweedie (1993) the use of drift and minorization conditions has become a popular method for establishing the existence of a CLT. Indeed without this constructive methodology it is difficult to envision how one would deal with complicated situations encountered in MCMC. In turn, this has led much of the recent work on general state space Markov chains to focus on the implications of drift and minorization. Another outcome of this approach is that classical results in mixing processes have been somewhat neglected. For example, Nummelin (2002) and Roberts and Rosenthal (2004) recently provided nice reviews of Markov chain theory and its connection to MCMC. In particular, both articles contain a review of CLTs for Markov chains but neither contains any substantive discussion of the results from mixing processes. On the other hand, work on mixing processes rarely discusses their applicability to the important Markov chain setting outside of the occasional discrete state space example. For example, Bradley (1999) provided a recommended review of CLTs for mixing processes but made no mention of their connections with Markov chains. Also, Robert (1995) gave a brief discussion of the implication of mixing conditions for Markov chain CLTs but failed to connect them to the use of drift conditions. Thus one of the main goals of this article is to consider the connections between drift and minorization and mixing conditions and their implications for the CLT for general state space Markov chains.
2 Markov Chains and Examples

Let $P(x, dy)$ be a Markov transition kernel on a general space $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ and write the associated discrete time Markov chain as $X = \{X_i : i = 0, 1, 2, \ldots\}$. For $n \in \mathbb{N} := \{1, 2, 3, \ldots\}$, let $P^n(x, dy)$ denote the $n$-step Markov transition kernel corresponding to $P$. Then for $i \in \mathbb{N}$, $x \in \mathsf{X}$ and a measurable set $A$, $P^n(x, A) = \Pr(X_{n+i} \in A \mid X_i = x)$. Let $f : \mathsf{X} \to \mathbb{R}$ be a Borel function and define $Pf(x) := \int f(y) P(x, dy)$ and $\Delta f(x) := Pf(x) - f(x)$. Always, $X$ will be assumed to be Harris ergodic, that is, aperiodic, $\psi$-irreducible and positive Harris recurrent; for definitions see Meyn and Tweedie (1993) or Nummelin (1984). These assumptions are more than enough to guarantee a strong form of convergence: for every initial probability measure $\lambda(\cdot)$ on $\mathcal{B}(\mathsf{X})$

$$\|P^n(\lambda, \cdot) - \pi(\cdot)\| \to 0 \quad \text{as } n \to \infty \tag{2}$$

where $P^n(\lambda, A) := \int_{\mathsf{X}} P^n(x, A)\,\lambda(dx)$ and $\|\cdot\|$ is the total variation norm. Throughout we will be concerned with the rate of this convergence. Let $M(x)$ be a nonnegative function and $\gamma(n)$ be a nonnegative decreasing function on $\mathbb{Z}_+$ such that

$$\|P^n(x, \cdot) - \pi(\cdot)\| \le M(x)\gamma(n) . \tag{3}$$

When $X$ is geometrically ergodic (3) holds with $\gamma(n) = t^n$ for some $t < 1$. Uniform ergodicity means $M$ is bounded and $\gamma(n) = t^n$ for some $t < 1$. Polynomial ergodicity of order $m$ where $m > 0$ corresponds to $\gamma(n) = n^{-m}$.

Establishing (3) directly may be difficult when $\mathsf{X}$ is a general space. However, some constructive methods are given in the following brief discussion; the interested reader should consult Jarner and Roberts (2002) and Meyn and Tweedie (1993) for a more complete introduction to these methods.

A minorization condition holds on a set $C$ if there exists a probability measure $Q$ on $\mathcal{B}(\mathsf{X})$, a positive integer $n_0$ and an $\epsilon > 0$ such that

$$P^{n_0}(x, A) \ge \epsilon\, Q(A) \quad \forall x \in C ,\ A \in \mathcal{B}(\mathsf{X}) . \tag{4}$$

In this case, $C$ is said to be small. If (4) holds with $C = \mathsf{X}$ then $X$ is uniformly ergodic and, as is well-known,

$$\|P^n(x, \cdot) - \pi(\cdot)\| \le (1 - \epsilon)^{\lfloor n/n_0 \rfloor} .$$

Uniformly ergodic Markov chains are rarely encountered in MCMC unless $\mathsf{X}$ is finite or bounded.
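
The uniform bound above is easy to evaluate numerically. The following sketch, with hypothetical helper names `tv_bound` and `steps_for_tolerance`, computes the bound and the smallest number of iterations after which it falls below a given tolerance; it assumes 0 < eps < 1 and 0 < tol < 1.

```python
import math

def tv_bound(n, eps, n0=1):
    """Upper bound (1 - eps)^floor(n / n0) on the total variation distance."""
    return (1.0 - eps) ** (n // n0)

def steps_for_tolerance(tol, eps, n0=1):
    """Smallest n such that (1 - eps)^floor(n / n0) <= tol."""
    # Need floor(n / n0) >= log(tol) / log(1 - eps); both logarithms are negative.
    k = math.ceil(math.log(tol) / math.log(1.0 - eps))
    return max(k, 0) * n0

# Hypothetical usage: a one-step minorization with eps = 0.1.
print(steps_for_tolerance(0.01, 0.1), tv_bound(44, 0.1))
```

The calculation makes concrete why a very small minorization constant, as in large finite state spaces, can render the bound too loose to be useful in practice.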

Geometric ergodicity may be established via the following drift condition: Suppose that for a function $V : \mathsf{X} \to [1, \infty)$ there exist constants $d > 0$, $b < \infty$ such that

$$\Delta V(x) \le -d\,V(x) + b\,I(x \in C), \quad x \in \mathsf{X} \tag{5}$$

where $C$ is a small set and $I$ is the usual indicator function.
Polynomial ergodicity may be established via a slightly different drift condition: Suppose that for a function $V : \mathsf{X} \to [1, \infty)$ there exist constants $d > 0$, $b < \infty$ and $0 \le \tau < 1$ such that

$$\Delta V(x) \le -d\,[V(x)]^{\tau} + b\,I(x \in C), \quad x \in \mathsf{X} \tag{6}$$

where $C$ is a small set. Jarner and Roberts (2002) show that (6) implies that $X$ is polynomially ergodic of degree $\tau/(1 - \tau)$. Douc et al. (2004) have recently generalized this drift condition to other subgeometric (slower than geometric) rates of convergence.

Remark 1. Either of the drift conditions (5) or (6) implies that in (3) we can take $M(x) \propto V(x)$. Moreover, Theorem 14.3.7 in Meyn and Tweedie (1993) shows that if (5) holds then $E_\pi V < \infty$. Since geometric ergodicity is equivalent to (5) (Meyn and Tweedie, 1993, Chapter 16) we conclude that geometrically (and uniformly) ergodic Markov chains satisfy (3) with $E_\pi M < \infty$. On the other hand, the polynomial drift (6) only seems to imply that $E_\pi V^{\tau} < \infty$ where $\tau < 1$. Thus, when (6) holds, to ensure that $E_\pi M < \infty$ we will have to show that $E_\pi V < \infty$.

Beyond establishing a rate of convergence, drift conditions also immediately imply the existence 
of a CLT for certain functions. 

Theorem 1. Let $X$ be a Harris ergodic Markov chain on $\mathsf{X}$ having stationary distribution $\pi$. Suppose $f : \mathsf{X} \to \mathbb{R}$ and assume that one of the following conditions holds:

1. The drift condition (5) holds and $f^2(x) \le V(x)$ for all $x \in \mathsf{X}$.

2. The drift condition (6) holds and $|f(x)| \le V(x)^{\tau + \eta - 1}$ for all $x \in \mathsf{X}$ where $1 - \tau \le \eta \le 1$ is such that $E_\pi V^{2\eta} < \infty$.

Then $\sigma_f^2 \in [0, \infty)$ and if $\sigma_f^2 > 0$ then for any initial distribution

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2)$$

as $n \to \infty$.

Remark 2. The first part of the theorem is from Meyn and Tweedie (1993, Theorem 17.0.1) while the second part is due to Jarner and Roberts (2002, Theorem 4.2).

Remark 3. Kontoyiannis and Meyn (2003) investigate the rate of convergence in the CLT when the drift condition (5) holds.

There has been a substantial amount of effort devoted to establishing drift and minorization conditions. For example, Hobert and Geyer (1998), Jones and Hobert (2004), Robert (1995), Roberts and Polson (1994), Rosenthal (1995, 1996) and Tierney (1994) examined Gibbs samplers while Christensen et al. (2001), Douc et al. (2004), Fort and Moulines (2000), Fort and Moulines (2003), Geyer (1999), Jarner and Hansen (2000), Jarner and Roberts (2002), Meyn and Tweedie (1993), and Mengersen and Tweedie (1996) considered Metropolis-Hastings-Green (MHG) algorithms. Also, Mira and Tierney (2002) and Roberts and Rosenthal (1999) worked with slice samplers.



In the next section three simple examples are presented in order to give the reader a taste of using these results in specific models and to demonstrate the application of Theorem 1. More substantial examples will be considered in Section 5.



2.1 Examples 



Example 1 continued. Since $\mathsf{X}$ is finite it is easy to see that (4) holds with $C = \mathsf{X}$ and hence the Markov chain described in Example 1 is uniformly ergodic. Of course, if $n_1$ and $n_2$ are reasonably large $\epsilon$ may be too small to be useful.

Example 2. Suppose $X$ lives on $\mathsf{X} = \mathbb{Z}$ such that if $x \ge 1$ and $0 < \theta < 1$ then

$$P(x, x+1) = P(-x, -x-1) = \theta, \quad P(x, 0) = P(-x, 0) = 1 - \theta, \quad P(0, 1) = P(0, -1) = \tfrac{1}{2} .$$

This chain is Harris ergodic and has stationary distribution given by $\pi(0) = (1 - \theta)/(2 - \theta)$ and for $x \ge 1$

$$\pi(x) = \pi(-x) = \pi(0)\,\frac{\theta^{x-1}}{2} .$$

In the Appendix the drift condition (5) is verified with $V(x) = a^{|x|}$ for $a > 1$ satisfying $a\theta < 1$ and $(a\theta - 1)a + 1 - \theta < 0$ and $C = \{0\}$. Hence a CLT holds for $\bar f_n$ if $f^2(x) \le a^{|x|}$ $\forall x \in \mathbb{Z}$.
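
The stationary distribution and the drift inequality in Example 2 can be checked numerically. The sketch below is not a substitute for the analytic verification; the function names and the particular values of theta, a and d used in the usage lines are illustrative assumptions.

```python
def stationary_pi(x, theta):
    """pi(0) = (1 - theta)/(2 - theta); pi(x) = pi(-x) = pi(0) theta^(|x|-1) / 2."""
    pi0 = (1.0 - theta) / (2.0 - theta)
    if x == 0:
        return pi0
    return pi0 * theta ** (abs(x) - 1) / 2.0

def drift_increment(x, theta, a):
    """Delta V(x) = PV(x) - V(x) for the drift function V(x) = a^{|x|}."""
    def V(z):
        return a ** abs(z)
    if x == 0:
        return 0.5 * V(1) + 0.5 * V(-1) - V(0)
    step = x + 1 if x > 0 else x - 1  # move one step away from the origin
    return theta * V(step) + (1.0 - theta) * V(0) - V(x)

# Hypothetical values satisfying a*theta < 1 and (a*theta - 1)*a + 1 - theta < 0.
theta, a, d = 0.3, 2.0, 0.04
mass = sum(stationary_pi(x, theta) for x in range(-200, 201))
drift_ok = all(
    drift_increment(x, theta, a) <= -d * a ** abs(x)
    for x in range(-30, 31) if x != 0
)
print(mass, drift_ok, drift_increment(0, theta, a))
```

Outside the small set C = {0} the increment equals (a*theta - 1) a^{|x|} + 1 - theta, so the inequality is tightest at |x| = 1, which is where the condition (a*theta - 1)a + 1 - theta < 0 comes from.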



Example 3. Jarner and Roberts (2002) and Tuominen and Tweedie (1994) consider the following example and establish a polynomial rate of convergence. Let $X$ be a random walk on $[0, \infty)$ determined by

$$X_{n+1} = (X_n + W_{n+1})^{+}$$

where $W_1, W_2, \ldots$ is a sequence of independent and identically distributed real-valued random variables. As long as $E(W_1) < 0$ this chain will be Harris ergodic. When $E[(W_1^{+})^{m}] < \infty$ for some $m \ge 2$ Jarner and Roberts (2002) establish the drift condition (6) with $V(x) = (x + 1)^m$, $\tau = (m - 1)/m$ and $C = [0, k]$ for some $k < \infty$. Hence a CLT holds for $\bar f_n$ if $|f(x)| \le (x + 1)^{m\eta - 1}$ for all $x \ge 0$ where $1 - \tau \le \eta \le 1$ is such that $E_\pi (x + 1)^{2m\eta} < \infty$. Note that this moment condition also implies that $E_\pi V < \infty$ as long as $\eta \ge 1/2$. Hence by an earlier remark $E_\pi M < \infty$ with $M$ as in (3).



Two things are clear: (i) drift and minorization provide powerful constructive tools for establishing a rate of convergence in total variation; and (ii) they are less impressive (but often useful!) tools for establishing CLTs in that the results in Theorem 1 depend on the non-unique function $V$.
3 Mixing Sequences

The goal of this section is to introduce three types of mixing conditions and discuss some of the connections with the total variation convergence in (2) and (3). There are a variety of mixing conditions (e.g. absolute regularity) that will not be considered here since they don't seem to have much impact on the CLT. Roughly speaking, mixing conditions are all attempts to quantify the rate at which events in the distant future become independent of the past.

Let $Y := \{Y_n\}$ denote a general sequence of random variables on a probability space $(\Omega, \mathcal{F}, P)$ and let $\mathcal{F}_k^m = \sigma(Y_k, \ldots, Y_m)$.

Definition 1. The sequence $Y$ is said to be strongly mixing (or $\alpha$-mixing) if $\alpha(n) \to 0$ as $n \to \infty$ where

$$\alpha(n) := \sup_{k \ge 1}\ \sup_{A \in \mathcal{F}_1^k,\ B \in \mathcal{F}_{k+n}^{\infty}} |P(A \cap B) - P(A)P(B)| .$$



Harris ergodic Markov chains are strongly mixing. Recall the coupling inequality (p. 12, Lindvall, 1992):

$$\|P^n(x, \cdot) - \pi(\cdot)\| \le \Pr\nolimits_x(T > n) \tag{7}$$

where $T$ is the usual coupling time of two Markov chains; one started in stationarity and one started arbitrarily. Under our assumptions the coupling time is almost surely finite and $\Pr(T > n) \to 0$ as $n \to \infty$. Let $A$ and $B$ be Borel sets so that by (7)

$$|P^n(x, A) - \pi(A)| \le \Pr\nolimits_x(T > n)$$

and

$$\int_B \Pr\nolimits_x(T > n)\,\pi(dx) \ \ge\ \int_B |P^n(x, A) - \pi(A)|\,\pi(dx) \ \ge\ \left| \int_B [P^n(x, A) - \pi(A)]\,\pi(dx) \right| \ =\ |\Pr(X_n \in A \text{ and } X_0 \in B) - \pi(A)\pi(B)| .$$

Then $\alpha(n) \le E_\pi[\Pr_x(T > n)]$ and a dominated convergence argument shows that $E_\pi[\Pr_x(T > n)] \to 0$ as $n \to \infty$ and hence $\alpha(n) \to 0$ as $n \to \infty$. Moreover, the rate of total variation convergence bounds the rate of $\alpha$-mixing: if (3) holds with $E_\pi M < \infty$, a similar argument shows that $\alpha(n) \le \gamma(n) E_\pi M$ and hence $\alpha(n) = O(\gamma(n))$. For example, geometrically ergodic Markov chains enjoy exponentially fast strong mixing.

Suppose the process $Y$ is strictly stationary and let $f : \mathbb{R} \to \mathbb{R}$ be a Borel function. Define the process $W := \{W_n = f(Y_n)\}$. Set $\mathcal{G}_k^m := \sigma(W_k, \ldots, W_m)$; hence $\mathcal{G}_k^m \subseteq \mathcal{F}_k^m$. Let $\alpha_W$ and $\alpha_Y$ be the strong mixing coefficients for the processes $W$ and $Y$, respectively. Then $\alpha_W(n) \le \alpha_Y(n)$. Similar comments apply to the mixing conditions given below. This elementary observation is fundamental to the proofs of the Markov chain CLTs considered in the sequel.
Definition 2. The sequence $Y$ is said to be asymptotically uncorrelated (or $\rho$-mixing) if $\rho(n) \to 0$ as $n \to \infty$ where

$$\rho(n) := \sup\{\mathrm{corr}(U, V) :\ U \in L^2(\mathcal{F}_1^k),\ V \in L^2(\mathcal{F}_{k+n}^{\infty}),\ k \ge 1\} .$$

It is standard that $\rho$-mixing sequences are also strongly mixing and, in fact, $4\alpha(n) \le \rho(n)$. It is a consequence of the strong Markov property that if a Harris ergodic Markov chain is $\rho$-mixing then it enjoys exponentially fast $\rho$-mixing (Bradley, 1986, Theorem 4.2) in the sense that there exists a $\theta > 0$ such that $\rho(n) = O(e^{-\theta n})$.



Rosenblatt (1971) develops a necessary and sufficient condition for a Markov chain to be $\rho$-mixing but before giving it a slight digression is required. Define the Hilbert space $L^2(\pi) := \{f : \mathsf{X} \to \mathbb{R};\ E_\pi f^2 < \infty\}$ with inner product $(f, g) = E_\pi[f(x)g(x)]$ and norm $\|\cdot\|_2$. Let $L_0^2(\pi) := \{f \in L^2(\pi);\ E_\pi f = 0\}$ and note that if $f, g \in L_0^2(\pi)$ then $(f, g) = \mathrm{cov}_\pi(f, g)$. The kernel $P$ defines an operator $T : L^2(\pi) \to L^2(\pi)$ via

$$Tf(x) = \int P(x, dy) f(y) .$$

It is easy to show that $T$ is a contraction (i.e., $\|T\| \le 1$). Also, $T$ is self-adjoint if and only if the kernel $P$ satisfies detailed balance with respect to $\pi$:

$$\pi(dx)P(x, dy) = \pi(dy)P(y, dx) \quad \forall x, y \in \mathsf{X} . \tag{8}$$

Rosenblatt (1971, p. 207) shows that a Harris ergodic Markov chain is $\rho$-mixing if and only if

$$\lim_{n \to \infty}\ \sup_{f \in L_0^2(\pi),\ \|f\|_2 \le 1} \|T^n f\|_2 = 0 . \tag{9}$$



There has been some work done on establishing sufficient conditions for Markov chains to be $\rho$-mixing. For example, Liu et al. (1995) show that if the operator induced by a Gibbs sampler satisfies a Hilbert-Schmidt condition then it is $\rho$-mixing. However, the most interesting case is given by Roberts and Rosenthal (1997) whose Theorem 2.1 shows that if $X$ is geometrically ergodic and (8) holds then there exists a $c < 1$ such that $\|T\| \le c$ on $L_0^2(\pi)$ and, since $T$ is self-adjoint, $\|T^n\| = \|T\|^n$; hence (9) holds. We conclude that if $X$ is geometrically ergodic and (8) holds then $X$ is asymptotically uncorrelated.

Remark 4. Many Markov chains satisfy (8); indeed the MHG algorithm satisfies (8) by construction. However, (8) does not hold for those Markov chains associated with systematic scan Gibbs samplers and the Markov chain in Example 2, for example.

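Definition 3. The sequence $Y$ is said to be uniformly mixing (or $\phi$-mixing) if $\phi(n) \to 0$ as $n \to \infty$ where

$$\phi(n) := \sup_{k \ge 1}\ \sup_{A \in \mathcal{F}_1^k,\ P(A) > 0,\ B \in \mathcal{F}_{k+n}^{\infty}} |P(B \mid A) - P(B)| .$$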
Uniformly mixing sequences are also asymptotically uncorrelated and strongly mixing. Moreover, $\rho(n) \le 2\sqrt{\phi(n)}$. A Harris ergodic Markov chain is uniformly ergodic if and only if it is uniformly mixing; see Ibragimov and Linnik (1971, pp. 367-368).

As with asymptotically uncorrelated sequences it is a consequence of the strong Markov property that if a Harris ergodic Markov chain is $\phi$-mixing then it enjoys exponentially fast $\phi$-mixing (Bradley, 1986, Theorem 4.2) in the sense that there exists a $\theta > 0$ such that $\phi(n) = O(e^{-\theta n})$.

We collect and concisely state the main conclusions of this section. 
Theorem 2. Let $X$ be a Harris ergodic Markov chain with stationary distribution $\pi$.

1. $X$ is strongly mixing, i.e., $\alpha(n) \to 0$.

2. If (3) holds with $E_\pi M < \infty$ then $\alpha(n) = O(\gamma(n))$.

3. If $X$ is geometrically ergodic and (8) holds then $X$ is asymptotically uncorrelated, in which case there exists a $\theta > 0$ such that $\rho(n) = O(e^{-\theta n})$.

4. $X$ is uniformly ergodic if and only if $X$ is uniformly mixing, in which case there exists a $\theta > 0$ such that $\phi(n) = O(e^{-\theta n})$.

4 Central Limit Theorems 

We begin with a characterization of the CLT for strongly mixing processes. Define $S_n = \sum_{i=1}^{n} Y_i$ and $\sigma_n^2 = E S_n^2$.

Theorem 3. (Cogburn, 1960; Denker, 1986; Mori and Yoshihara, 1986) Let $Y$ be a centered strictly stationary strongly mixing sequence such that $E Y_0^2 < \infty$. If $\sigma_n^2 \to \infty$ as $n \to \infty$ then the following are equivalent:

1. $S_n / \sigma_n \xrightarrow{d} N(0, 1)$ as $n \to \infty$.

2. $\{S_n^2 / \sigma_n^2 ,\ n \ge 1\}$ is uniformly integrable.

Remark 5. Since Harris ergodic Markov chains are strongly mixing this result is applicable in MCMC settings.

Remark 6. The assumption of stationarity is not an issue for Harris ergodic Markov chains since if a CLT holds for any one initial distribution then it holds for every initial distribution (Meyn and Tweedie, 1993, Proposition 17.1.6).



Chen (1999) provides the following characterization of the CLT.

Theorem 4. (Chen, 1999) Let $X$ be a Harris ergodic Markov chain and $f$ be a function such that $E_\pi f = 0$ and $E_\pi f^2 < \infty$. Then the following are equivalent:

1. $\sqrt{n}\,\bar f_n \xrightarrow{d} N(0, \sigma^2)$ for some $\sigma^2 \ge 0$.

2. $\{\sqrt{n}\,\bar f_n ,\ n \ge 1\}$ is bounded in probability.

Remark 7. Chen (1999) also provides another equivalent condition in terms of quantities based on the so-called split chain. But this is not germane to the current discussion.
4.1 Sufficient Conditions 



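Theorem 5. (Ibragimov, 1962; Ibragimov and Linnik, 1971) Let $Y$ be a centered strictly stationary strongly mixing sequence. Suppose at least one of the following conditions:

1. There exists $B < \infty$ such that $|Y_n| < B$ a.s. and $\sum_n \alpha(n) < \infty$; or

2. $E|Y_n|^{2+\delta} < \infty$ for some $\delta > 0$ and

$$\sum_n \alpha(n)^{\delta/(2+\delta)} < \infty . \tag{10}$$

Then

$$\sigma^2 = E(Y_0^2) + 2 \sum_{j=1}^{\infty} E(Y_0 Y_j) < \infty$$

and if $\sigma^2 > 0$, as $n \to \infty$,

$$n^{-1/2} S_n \xrightarrow{d} N(0, \sigma^2) .$$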

Corollary 1. Let $f : \mathsf{X} \to \mathbb{R}$ be a Borel function such that $E_\pi |f(x)|^{2+\delta} < \infty$ for some $\delta > 0$ and suppose $X$ is a Harris ergodic Markov chain with stationary distribution $\pi$. If (3) holds such that $E_\pi M < \infty$ and $\gamma(n)$ satisfies

$$\sum_n \gamma(n)^{\delta/(2+\delta)} < \infty \tag{11}$$

then for any initial distribution, as $n \to \infty$

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2) .$$

Later, CLTs for $\phi$-mixing and $\rho$-mixing Markov chains will be presented. However, the proofs of these results are similar to the proof of Corollary 1. Hence only the following proof is included.

Proof. Let $\alpha(n)$ and $\alpha_f(n)$ denote the strong mixing coefficients for the Markov chain $X = \{X_n\}$ and the functional process $\{f(X_n)\}$, respectively. By an earlier remark $\alpha_f(n) \le \alpha(n)$ for all $n \ge 1$. Moreover, we have that $\alpha(n) \le \gamma(n) E_\pi M$ where $\gamma(n)$ and $M$ are given in (3). Hence (11) guarantees that

$$\sum_n \alpha_f(n)^{\delta/(2+\delta)} < \infty$$

and the result follows from Theorem 5 and Remark 6. □



Corollary 1 immediately yields some special cases which have proven to be useful in MCMC settings.

Corollary 2. Suppose $X$ is a Harris ergodic Markov chain with stationary distribution $\pi$ and let $f : \mathsf{X} \to \mathbb{R}$ be a Borel function. Assume one of the following conditions:

1. (Chan and Geyer, 1994) $X$ is geometrically ergodic and $E_\pi |f(x)|^{2+\delta} < \infty$ for some $\delta > 0$;

2. $X$ is polynomially ergodic of order $m$, $E_\pi M < \infty$ and $E_\pi |f(x)|^{2+\delta} < \infty$ where $m\delta > 2 + \delta$; or

3. $X$ is polynomially ergodic of order $m > 1$, $E_\pi M < \infty$ and there exists $B < \infty$ such that $|f(x)| < B$ $\pi$-almost surely.

Then for any initial distribution, as $n \to \infty$

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2) .$$
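
As a short check of where the three conditions of Corollary 2 come from, recall that $\gamma(n) = t^n$ with $t < 1$ in the geometric case and $\gamma(n) = n^{-m}$ in the polynomial case. The summability requirement (11) then reads

$$\sum_n t^{\,n\delta/(2+\delta)} = \sum_n \left(t^{\delta/(2+\delta)}\right)^{n} < \infty \quad \text{for every } \delta > 0 ,$$

$$\sum_n n^{-m\delta/(2+\delta)} < \infty \iff \frac{m\delta}{2+\delta} > 1 \iff m\delta > 2 + \delta ,$$

while for bounded $f$ the first condition of Theorem 5 applies because $\sum_n \alpha(n) \le E_\pi M \sum_n n^{-m} < \infty$ whenever $m > 1$.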

For geometrically ergodic Markov chains the moment condition can not be weakened to a second moment (i.e., $E_\pi f^2(x) < \infty$) without additional assumptions. Häggström (2004) has recently established the existence of a geometrically ergodic Markov chain and a function $f$ such that $E_\pi f^2(x) < \infty$ yet a CLT fails for any choice of $\sigma^2$. Also, see Bradley (1983) for a counterexample with the same conclusion. These results are not too surprising since there are non-trivial counterexamples that indicate that the conditions of Theorem 5 are nearly as good as can be expected. For example, Herrndorf (1983) constructs a strictly stationary sequence of uncorrelated random variables, $\{Y_n\}$, that have an arbitrarily fast strong mixing rate and $0 < E Y_0^2 < \infty$ yet the CLT fails. Further counterexamples have been given by Davydov (1973) and Bradley (1985). However, a slightly weaker moment condition is available if the sequence enjoys at least exponentially fast strong mixing which is the case for geometrically ergodic Markov chains. The following theorem is a special case of a result in Doukhan et al. (1994).

Theorem 6. (Doukhan et al., 1994) Let $Y$ be a centered strictly stationary strongly mixing sequence. If the strong mixing coefficients satisfy $\alpha(n) = O(a^n)$ for some $0 < a < 1$ and $E[Y_0^2 \log^{+}|Y_0|] < \infty$ then

$$\sigma^2 = E Y_0^2 + 2 \sum_{k=1}^{\infty} E(Y_0 Y_k)$$

converges absolutely and if $\sigma^2 > 0$, as $n \to \infty$

$$n^{-1/2} S_n \xrightarrow{d} N(0, \sigma^2) .$$
Corollary 3. Suppose $X$ is a Harris ergodic Markov chain with stationary distribution $\pi$ and let $f : \mathsf{X} \to \mathbb{R}$ be a Borel function. If $X$ is geometrically ergodic and $E_\pi[f^2(x) \log^{+}|f(x)|] < \infty$ then for any initial distribution, as $n \to \infty$

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2) .$$

A weaker moment condition is available for $\rho$-mixing sequences.



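Theorem 7. (Ibragimov, 1975) Let $Y$ be a centered strictly stationary $\rho$-mixing sequence with $E Y_0^2 < \infty$. Suppose

$$\sum_{n=1}^{\infty} \rho(n) < \infty . \tag{12}$$

Then

$$\sigma^2 = E Y_0^2 + 2 \sum_{k=1}^{\infty} E(Y_0 Y_k)$$

converges absolutely and if $\sigma^2 > 0$, as $n \to \infty$

$$n^{-1/2} S_n \xrightarrow{d} N(0, \sigma^2) .$$

Recall that if the Markov chain $X$ is geometrically ergodic and satisfies detailed balance, it enjoys exponentially fast $\rho$-mixing and hence (12) obtains.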



Corollary 4. (Roberts and Rosenthal, 1997) Let $X$ be a geometrically ergodic Markov chain with stationary distribution $\pi$. Suppose $X$ satisfies (8) and that $E_\pi f^2(x) < \infty$. Then for any initial distribution, as $n \to \infty$

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2) .$$

Remark 8. Roberts and Rosenthal (1997) obtain this result via Corollary 1.5 in Kipnis and Varadhan (1986). We have thus provided an alternative derivation.



An accessible proof of the following result may be found in Billingsley (1968) and Ibragimov and Linnik (1971). Also see Chapter 5 of Doob (1953) and Lemma 3.3 in Cogburn (1972).

Theorem 8. Let $Y$ be a centered strictly stationary uniformly mixing sequence with $E Y_0^2 < \infty$. If

$$\sum_n \sqrt{\phi(n)} < \infty \tag{13}$$

then

$$\sigma^2 = E Y_0^2 + 2 \sum_{k=1}^{\infty} E(Y_0 Y_k)$$

converges absolutely and if $\sigma^2 > 0$ then as $n \to \infty$

$$n^{-1/2} S_n \xrightarrow{d} N(0, \sigma^2) .$$

If $X$ is uniformly ergodic the coefficients $\phi(n)$ decrease exponentially and (13) is obvious.



Corollary 5. (Ibragimov and Linnik, 1971; Tierney, 1994) Let $X$ be a uniformly ergodic Markov chain with stationary distribution $\pi$. Suppose $E_\pi f^2(x) < \infty$. Then for any initial distribution, as $n \to \infty$

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2) .$$

The main conclusions of this section can be concisely stated as follows. 

Theorem 9. Let $X$ be a Harris ergodic Markov chain on $\mathsf{X}$ with invariant distribution $\pi$ and let $f : \mathsf{X} \to \mathbb{R}$ be a Borel function. Assume one of the following conditions:

1. $X$ is polynomially ergodic of order $m > 1$, $E_\pi M < \infty$ and there exists $B < \infty$ such that $|f(x)| < B$ almost surely;

2. $X$ is polynomially ergodic of order $m$, $E_\pi M < \infty$ and $E_\pi |f(x)|^{2+\delta} < \infty$ where $m\delta > 2 + \delta$;

3. $X$ is geometrically ergodic and $E_\pi |f(x)|^{2+\delta} < \infty$ for some $\delta > 0$;

4. $X$ is geometrically ergodic and $E_\pi[f^2(x) \log^{+}|f(x)|] < \infty$;

5. $X$ is geometrically ergodic, satisfies (8) and $E_\pi f^2(x) < \infty$; or

6. $X$ is uniformly ergodic and $E_\pi f^2(x) < \infty$.

Then for any initial distribution, as $n \to \infty$

$$\sqrt{n}\,(\bar f_n - E_\pi f) \xrightarrow{d} N(0, \sigma_f^2) .$$

Remark 9. Condition 1 of the theorem is interesting for applications of MCMC in Bayesian settings, where posterior probabilities, i.e. expectations of indicator functions, are often of interest. Since indicator functions are bounded it follows that a CLT will hold under a very weak mixing condition.

5 Examples 

5.1 Toy Examples Revisited 

Example 1 continued. Recall that since $\mathsf{X}$ is finite this chain is uniformly ergodic and uniformly mixing. Hence Corollary 5 implies that a CLT will hold for $\bar f_n$ if $E_\pi f^2(x) < \infty$ which will hold except in unusual cases.

Example 2 continued. This chain is geometrically ergodic but does not satisfy (8). Hence it is strongly mixing and we can not conclude that it is asymptotically uncorrelated. Thus the best we can do is to appeal to Corollary 3 and conclude that a CLT will hold for $\bar f_n$ if $E_\pi[f(x)^2 \log^{+}|f(x)|] < \infty$. Recall that in subsection 2.1 it was shown that a CLT holds for $\bar f_n$ if $f^2(x) \le a^{|x|}$ $\forall x \in \mathbb{Z}$ when $a > 1$ satisfies $a\theta < 1$ and $(a\theta - 1)a + 1 - \theta < 0$.

Example 3 continued. Let $m > 2$ and recall that this random walk is polynomially ergodic of order $m - 1$ and that Theorem 1 says a CLT holds if $|f(x)| \le (x + 1)^{m\eta - 1}$ for all $x \ge 0$ where $1 - \tau \le \eta \le 1$ is such that $E_\pi (x + 1)^{2m\eta} < \infty$. Alternatively, an appeal to Corollary 2 says that we have a CLT if $E_\pi (x + 1)^m < \infty$ and $E_\pi |f(x)|^{2+\delta} < \infty$ where $\delta > 2/(m - 2)$.
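
The bound on $\delta$ in Example 3 follows from the second condition of Corollary 2 applied with the order of polynomial ergodicity equal to $m - 1$ rather than $m$:

$$(m - 1)\delta > 2 + \delta \iff (m - 2)\delta > 2 \iff \delta > \frac{2}{m - 2}, \qquad m > 2 .$$

For instance, $m = 4$ requires $\delta > 1$, that is, slightly more than a third moment of $f$ under $\pi$, and the requirement approaches a second moment as $m$ grows. The condition $E_\pi (x + 1)^m < \infty$ is $E_\pi V < \infty$, which gives $E_\pi M < \infty$ because $M \propto V$.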



5.2 A Benchmark Gibbs Sampler 



The following Gibbs sampler is similar to one used by many authors to analyze the benchmark pump failure data set in Gaver and O'Muircheartaigh (1987). For example, Robert and Casella (1999), Rosenthal (1995), and Tierney (1994) consider similar settings and establish uniform ergodicity of the corresponding Gibbs samplers.

Set $y = (y_1, y_2, \ldots, y_n)^T$ and let $\pi(x, y)$ be a joint density on $\mathbb{R}^{n+1}$ such that the corresponding full conditionals are

$$X \mid y \sim \text{Gamma}(\alpha_1,\ a + b^T y)$$
$$Y_i \mid x \sim \text{Gamma}(\alpha_{2i},\ \beta_i(x))$$

for $i = 1, \ldots, n$, $b = (b_1, \ldots, b_n)^T$ where $a > 0$ and each $b_i > 0$ are known. (Say $U \sim \text{Gamma}(\alpha, \beta)$ if its density is proportional to $u^{\alpha - 1} e^{-\beta u} I(u > 0)$.) Since, conditional on $x$, the order in which the $Y_i$ are updated is irrelevant we will use a two variable Gibbs sampler with the transition rule $(x', y') \to (x, y)$; that is, given that the current value is $(X_n = x', Y_n = y')$ we obtain $(X_{n+1}, Y_{n+1})$ by first drawing $x \sim f(x \mid y')$ then $Y_{i,n+1} \sim f(y_i \mid x)$. A minor modification of Tierney's argument will show that (4) holds on $C = \mathsf{X}$ with $n_0 = 1$ and if for $i = 1, \ldots, n$ there is a function $g > 0$ such that for all $x > 0$



(14) 



hix + I3i{x) 

Thus if (14) holds this Gibbs sampler is uniformly ergodic (or uniformly mixing) and an appeal to Corollary 5 shows that a CLT is assured if $E_\pi f^2(x) < \infty$.
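
The two variable update can be sketched in a few lines. This is only an illustration: the function name, the starting value, the numerical constants and in particular the choice beta_i(x) = t_i + x in the usage lines are assumptions made here for concreteness, since the text leaves the functions beta_i general. Python's `gammavariate` takes a shape and a scale, so each rate is inverted.

```python
import random

def gibbs_two_block(n_iter, alpha1, a, b, alpha2, beta, y0, seed=0):
    """Two variable Gibbs sampler with full conditionals
    X | y ~ Gamma(alpha1, a + b'y) and Y_i | x ~ Gamma(alpha2[i], beta[i](x)),
    where Gamma(shape, rate) has density proportional to u^(shape-1) exp(-rate*u)."""
    rng = random.Random(seed)
    y = list(y0)
    draws = []
    for _ in range(n_iter):
        # Transition (x', y') -> (x, y): draw x given the current y' ...
        rate_x = a + sum(bi * yi for bi, yi in zip(b, y))
        x = rng.gammavariate(alpha1, 1.0 / rate_x)
        # ... then draw each y_i given the new x (their order is irrelevant).
        y = [rng.gammavariate(sh, 1.0 / rate(x)) for sh, rate in zip(alpha2, beta)]
        draws.append((x, y))
    return draws

# Hypothetical configuration with n = 3 and beta_i(x) = t_i + x.
times = [1.0, 2.5, 0.5]
beta_fns = [lambda x, t=t: t + x for t in times]
chain = gibbs_two_block(1000, alpha1=2.0, a=1.0, b=[1.0, 1.0, 1.0],
                        alpha2=[1.5, 1.5, 1.5], beta=beta_fns, y0=[1.0, 1.0, 1.0])
x_bar = sum(x for x, _ in chain) / len(chain)  # ergodic average of f(x, y) = x
print(x_bar)
```

When the sampler is uniformly ergodic, ergodic averages such as `x_bar` satisfy a CLT as soon as the function being averaged has a finite second moment under the target.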



5.3 A Gibbs Sampler for a Hierarchical Bayes Setting 

Consider the following Bayesian version of the classical normal theory one-way random effects model. First, conditional on $\theta = (\theta_1, \ldots, \theta_K)^T$ and $\lambda_e$, the data, $Y_{ij}$, are independent with

$$Y_{ij} \mid \theta, \lambda_e \sim N(\theta_i, \lambda_e^{-1})$$

where $i = 1, \ldots, K$ and $j = 1, \ldots, m_i$. Conditional on $\mu$ and $\lambda_\theta$, $\theta_1, \ldots, \theta_K$ and $\lambda_e$ are independent with

$$\theta_i \mid \mu, \lambda_\theta \sim N(\mu, \lambda_\theta^{-1}) \quad \text{and} \quad \lambda_e \sim \text{Gamma}(a_2, b_2)$$

where $a_2$ and $b_2$ are known positive constants. Finally, $\mu$ and $\lambda_\theta$ are assumed independent with

$$\mu \sim N(m_0, s_0^{-1}) \quad \text{and} \quad \lambda_\theta \sim \text{Gamma}(a_1, b_1)$$

where $m_0, s_0, a_1$ and $b_1$ are known constants; all of the priors are proper since $s_0, a_1$ and $b_1$ are assumed to be positive and $m_0 \in \mathbb{R}$. The posterior density of this hierarchical model is characterized by

$$\pi(\theta, \mu, \lambda \mid y) \propto g(y \mid \theta, \lambda_e)\, g(\theta \mid \mu, \lambda_\theta)\, g(\lambda_e)\, g(\mu)\, g(\lambda_\theta) \tag{15}$$

where $\lambda = (\lambda_\theta, \lambda_e)^T$, $y$ is a vector containing all of the data, and $g$ denotes a generic density. Expectations with respect to $\pi$ typically require evaluation of ratios of intractable integrals, which may have dimension $K + 3$ and typically, $K \ge 3$.

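We are interested in the standard Gibbs sampler which leaves the posterior (15) invariant. Define

$$v_1(\theta, \mu) = \sum_{i=1}^{K} (\theta_i - \mu)^2, \quad v_2(\theta) = \sum_{i=1}^{K} m_i (\theta_i - \bar y_i)^2 \quad \text{and} \quad \text{SSE} = \sum_{i,j} (y_{ij} - \bar y_i)^2$$

where $\bar y_i = m_i^{-1} \sum_{j=1}^{m_i} y_{ij}$. The full conditionals for the variance components are

$$\lambda_\theta \mid \theta, \mu, \lambda_e, y \sim \text{Gamma}\left(\frac{K}{2} + a_1,\ \frac{v_1(\theta, \mu)}{2} + b_1\right) \tag{16}$$

and

$$\lambda_e \mid \theta, \mu, \lambda_\theta, y \sim \text{Gamma}\left(\frac{M}{2} + a_2,\ \frac{v_2(\theta) + \text{SSE}}{2} + b_2\right) \tag{17}$$

where $M = \sum_i m_i$. If $\theta_{-i} = (\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_K)^T$ and $\bar\theta = K^{-1} \sum_i \theta_i$, the remaining full conditionals are

$$\theta_i \mid \theta_{-i}, \mu, \lambda_\theta, \lambda_e, y \sim N\left(\frac{\lambda_\theta \mu + m_i \lambda_e \bar y_i}{\lambda_\theta + m_i \lambda_e},\ \frac{1}{\lambda_\theta + m_i \lambda_e}\right)$$

for $i = 1, \ldots, K$ and

$$\mu \mid \theta, \lambda_\theta, \lambda_e, y \sim N\left(\frac{s_0 m_0 + K \lambda_\theta \bar\theta}{s_0 + K \lambda_\theta},\ \frac{1}{s_0 + K \lambda_\theta}\right) .$$

Our fixed-scan Gibbs sampler updates $\mu$ then the $\theta_i$'s then $\lambda_\theta$ and $\lambda_e$. Since the $\theta_i$'s are conditionally independent given $(\mu, \lambda)$, the order in which they are updated is irrelevant. The same is true of $\lambda_\theta$ and $\lambda_e$ since these two random variables are conditionally independent given $(\theta, \mu)$. A one-step transition of this Gibbs sampler is $(\mu', \theta', \lambda') \to (\mu, \theta, \lambda)$ meaning that we sequentially draw from the conditional distributions $\mu \mid \lambda', \theta'$ then $\theta_i \mid \theta_{-i}, \mu, \lambda'$ for $i = 1, \ldots, K$ then $\lambda_\theta \mid \mu, \theta$ then $\lambda_e \mid \mu, \theta$. Assume that $m' = \min\{m_1, m_2, \ldots, m_K\} \ge 2$ and that $K \ge 3$. Let $m'' = \max\{m_1, m_2, \ldots, m_K\}$. Define $\delta_1 = (2a_1 + K - 2)^{-1}$ and $\delta_2 = (2a_1 - 2)^{-1}$.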
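
The fixed-scan update order described above can be sketched as follows. The function name, the starting values and the small synthetic data set are hypothetical choices made for illustration; the conditional distributions are the normal and Gamma full conditionals of the one-way random effects model, with Gamma rates converted to the scale parameter that `gammavariate` expects.

```python
import random

def one_way_gibbs(y, a1, b1, a2, b2, m0, s0, n_iter, seed=0):
    """Fixed-scan Gibbs sampler: mu, then theta_1..theta_K, then lambda_theta
    and lambda_e. y is a list of K lists; group i holds its m_i observations."""
    rng = random.Random(seed)
    K = len(y)
    m = [len(group) for group in y]
    M = sum(m)
    ybar = [sum(group) / len(group) for group in y]
    sse = sum((v - ybar[i]) ** 2 for i, group in enumerate(y) for v in group)
    # Arbitrary starting values (an assumption of this sketch).
    theta, lam_t, lam_e = list(ybar), 1.0, 1.0
    out = []
    for _ in range(n_iter):
        # mu | theta, lambda: normal with precision s0 + K * lambda_theta
        prec = s0 + K * lam_t
        theta_bar = sum(theta) / K
        mu = rng.gauss((s0 * m0 + K * lam_t * theta_bar) / prec, prec ** -0.5)
        # theta_i | mu, lambda: normal with precision lambda_theta + m_i * lambda_e
        for i in range(K):
            prec_i = lam_t + m[i] * lam_e
            mean_i = (lam_t * mu + m[i] * lam_e * ybar[i]) / prec_i
            theta[i] = rng.gauss(mean_i, prec_i ** -0.5)
        # precision components | theta, mu: Gamma(shape, rate)
        v1 = sum((t - mu) ** 2 for t in theta)
        v2 = sum(m[i] * (theta[i] - ybar[i]) ** 2 for i in range(K))
        lam_t = rng.gammavariate(K / 2 + a1, 1.0 / (v1 / 2 + b1))
        lam_e = rng.gammavariate(M / 2 + a2, 1.0 / ((v2 + sse) / 2 + b2))
        out.append((mu, list(theta), lam_t, lam_e))
    return out

# Hypothetical data with K = 3 groups of m_i = 3 observations each.
data = [[1.0, 1.2, 0.8], [2.0, 2.1, 1.9], [0.5, 0.4, 0.7]]
chain = one_way_gibbs(data, a1=2.0, b1=1.0, a2=2.0, b2=1.0, m0=0.0, s0=1.0, n_iter=500)
print(sum(draw[3] for draw in chain) / len(chain))  # ergodic average of lambda_e
```

Averages of functions of the precision components alone, such as the one printed here, are the kind of functional for which a drift-based CLT is most readily available.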

Proposition 1. (Jones and Hobert, 2004) Assume that $a_1 > 3/2$ and $5m' > m''$. Fix $c_1 \in (0, \min\{b_1, b_2\})$. Then the Gibbs sampler satisfies (5) with the drift function

V{9, A) = 1 + e^i^« + e^i^= + + (^ - yf . 

KdiXg So + KXg 






Remark 10. iJones and Hoberti (|200J) give values for the constants in (jSJ but in an effort to keep 
the notation under control we do not report them here. 

Theorem ^ immediately implies a CLT for /„ for any function / such that f'^{fi,0,X) < V{6,X) 
for all {fj,,0,X). Of course, it is easy to find functions involving fj, or 9 that do not satisfy this 
requirement. On the other hand, Theorem ^ will be useful for many functions of only Xq and Ag. 



Recall that the drift function may not be unique. Prior to the work of Jones and Hobert (2004), Hobert and Geyer (1998) also analyzed this Gibbs sampler and established the drift condition using a different drift function and more restrictive conditions on $a_1$ and $m'$. However, this drift function can alleviate some of the difficulties with using Theorem ^ for functions involving $\mu$.



Proposition 2. (Hobert and Geyer, 1998) Assume that $a_1 > (3K - 2)/(2K - 2)$ and $m' > (\sqrt{5} - 2)m''$. Then the Gibbs sampler satisfies the drift condition with the drift function



$$W(\mu,\theta,\lambda) = 1 + [\ldots] + \frac{[\ldots]}{s_0 + K\lambda_\theta}\,(\bar{\theta} - \bar{y})^2$$



where the constant appearing in $W$ lies strictly between 0 and 1 and is defined on p. 421 in Hobert and Geyer (1998).



Proposition 1 shows that this Gibbs sampler is geometrically ergodic as long as $a_1 > 3/2$ and $5m' > m''$. However, it does not satisfy detailed balance. An appeal to Corollary [2 or 13] shows that functions with a little bit more than a second moment with respect to (15) will enjoy a CLT.



5.4 Independence Sampler 

The independence sampler is an important special case of the MHG algorithm. Suppose the target distribution $\pi$ has support $\mathsf{X} \subseteq \mathbb{R}^k$ and a density which, in a slight abuse of notation, is also denoted $\pi$. Let $p$ be a proposal density whose support contains $\mathsf{X}$ and suppose the current state of the chain is $X_n = x$. Draw $y \sim p$ and set $X_{n+1} = y$ with probability

$$\alpha(x,y) = \frac{\pi(y)\,p(x)}{\pi(x)\,p(y)} \wedge 1 ;$$



otherwise set $X_{n+1} = x$. This Markov chain is Harris ergodic and it is well-known (Mengersen and Tweedie, 1996) that it is uniformly ergodic if there exists $\kappa > 0$ such that

$$\frac{\pi(x)}{p(x)} \le \kappa \tag{18}$$

for all $x$ since (18) implies a minorization condition on $\mathsf{X}$ with $n_0 = 1$ and $\epsilon = 1/\kappa$. Hence Corollary [S] implies that a CLT will hold for $\bar{f}_n$ if $E_\pi f^2 < \infty$. On the other hand, the independence sampler will not even be geometrically ergodic if there is a set of positive $\pi$-measure where (18) fails to hold. Moreover, in this case Roberts (1999) has given conditions which ensure a CLT cannot hold.
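
As an illustration of the weight bound (18), the following sketch runs an independence sampler on a toy problem. Everything specific here is an assumption made for the example: the target is a standard normal, the proposal is N$(0, 2^2)$, and the function name `independence_sampler` is hypothetical. For this pair the ratio of normalized densities is $2e^{-3x^2/8} \le 2$, so (18) holds with $\kappa = 2$.

```python
import numpy as np


def independence_sampler(log_pi, log_p, draw_p, x0, n_iter, rng):
    """Independence sampler; log_pi and log_p may be unnormalized log densities."""
    x = x0
    chain = np.empty(n_iter)
    accepted = 0
    for t in range(n_iter):
        y = draw_p(rng)  # the proposal does not depend on the current state
        # log of pi(y) p(x) / (pi(x) p(y)); normalizing constants cancel
        log_ratio = (log_pi(y) + log_p(x)) - (log_pi(x) + log_p(y))
        if rng.random() < np.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
        chain[t] = x
    return chain, accepted / n_iter


# Toy example (an assumption, not from the text): N(0, 1) target, N(0, 4) proposal.
def log_pi(x):
    return -0.5 * x ** 2


def log_p(x):
    return -0.125 * x ** 2


def draw_p(rng):
    return 2.0 * rng.standard_normal()


chain, rate = independence_sampler(log_pi, log_p, draw_p, 0.0, 20000,
                                   np.random.default_rng(0))
```

Swapping the roles of the two densities (a N$(0,4)$ target with a N$(0,1)$ proposal) makes the ratio in (18) unbounded, which is the situation in which geometric ergodicity fails.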
5.5 An MHG Algorithm for Finite Point Processes



The material in this subsection is adapted from Geyer (1999) and Møller (1999). Let $\mathcal{X}$ be a bounded region of $\mathbb{R}^d$ and let $\lambda$ be Lebesgue measure. Define $\mathcal{X}^0 := \{\emptyset\}$ and for $k \ge 1$, $\mathcal{X}^k := \mathcal{X} \times \cdots \times \mathcal{X}$ (there being $k$ terms in the Cartesian product). Think of $x \in \mathcal{X}^k$ as a pattern of $k$ points in $\mathcal{X}$, in particular, $\mathcal{X}^0$ denotes the pattern with no points, and define $n(x)$ to be the cardinality of $x$ so that if $x \in \mathcal{X}^k$ then $n(x) = k$. Let the state space $\mathsf{X}$ be the union of all $\mathcal{X}^k$, that is, $\mathsf{X} = \cup_{i=0}^{\infty} \mathsf{X}_i$ where $\mathsf{X}_i = \{x : n(x) = i\}$. The target $\pi$ is an unnormalized density with respect to the Poisson process with intensity measure $\lambda$ on $\mathcal{X}$. Geyer (1999) proposes the following MHG algorithm for simulating from $\pi$:



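1. With probability 1/2 attempt an up step

(a) Draw $\xi \sim \lambda(\cdot)/\lambda(\mathcal{X})$. Set $x = x \cup \xi$ with probability

$$1 \wedge \frac{\lambda(\mathcal{X})\,\pi(x \cup \xi)}{(n(x) + 1)\,\pi(x)}$$

2. Else attempt a down step

(a) If $x = \emptyset$ skip the down step

(b) Draw $\xi$ uniformly from the points of $x$. Set $x = x \setminus \xi$ with probability

$$1 \wedge \frac{n(x)\,\pi(x \setminus \xi)}{\lambda(\mathcal{X})\,\pi(x)} .$$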

This MHG algorithm is Harris recurrent and geometrically ergodic. 
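
The up and down steps above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than Geyer's code: the region is taken to be the unit square (so $\lambda(\mathcal{X}) = 1$), the unnormalized density is the hypothetical choice $\pi(x) = \beta^{n(x)}$, and the names `birth_death_step` and `run_point_process` are invented for the example. With this $\pi$ the number of points is Poisson with mean $\beta\lambda(\mathcal{X})$, which gives a simple check.

```python
import numpy as np


def birth_death_step(x, log_pi, volume, draw_point, rng):
    """One up/down move for a point pattern x stored as a list of points."""
    n = len(x)
    if rng.random() < 0.5:
        # up step: propose adding a point drawn uniformly from the region
        proposal = x + [draw_point(rng)]
        log_ratio = (np.log(volume) + log_pi(proposal)
                     - np.log(n + 1) - log_pi(x))
    else:
        if n == 0:
            return x  # nothing to delete, so the down step is skipped
        # down step: propose deleting a uniformly chosen point of x
        j = rng.integers(n)
        proposal = x[:j] + x[j + 1:]
        log_ratio = (np.log(n) + log_pi(proposal)
                     - np.log(volume) - log_pi(x))
    return proposal if rng.random() < np.exp(min(0.0, log_ratio)) else x


def run_point_process(n_iter, beta=3.0, seed=0):
    """Toy run on the unit square with pi(x) = beta ** n(x) (an assumed target)."""
    rng = np.random.default_rng(seed)

    def log_pi(x):
        return len(x) * np.log(beta)

    def draw_point(rng):
        return tuple(rng.random(2))

    x = []  # start from the empty pattern, where pi > 0
    counts = np.empty(n_iter, dtype=int)
    for t in range(n_iter):
        x = birth_death_step(x, log_pi, 1.0, draw_point, rng)
        counts[t] = len(x)
    return counts
```

For this toy target $\pi(x \cup \xi) = \beta\,\pi(x)$, so the bound required in the proposition that follows holds with $M = \beta$.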



Proposition 3. (Geyer, 1999) Suppose there exists a real number $M$ such that

$$\pi(x \cup \xi) \le M\pi(x)$$

for all $x \in \mathsf{X}$ and all $\xi \in \mathcal{X}$. Then the MHG algorithm started at $x^* \in \{x : \pi(x) > 0\}$ is Harris ergodic and satisfies the drift condition with the drift function $V(x) = A^{n(x)}$ where $A > M\lambda(\mathcal{X}) \vee 1$.

Of course, Theorem ^ implies a CLT for $\bar{f}_n$ for any function $f$ such that $f^2(x) \le A^{n(x)}$ for all $x$. On the other hand, this algorithm was constructed so as to satisfy detailed balance (see Geyer (1999) for a detailed argument) and hence the Markov chain is asymptotically uncorrelated so that a CLT holds when $E_\pi f^2(x) < \infty$.



5.6 Random Walk MHG Algorithms 

Let $\pi$ be a target density on $\mathbb{R}^k$ and let the proposal density have the form $q(y|x) = q(|y - x|)$. Now suppose that the current state of the chain is $X_n = x$. Draw $y \sim q$ and set $X_{n+1} = y$ with probability

$$\alpha(x,y) = \frac{\pi(y)}{\pi(x)} \wedge 1 ;$$

otherwise set $X_{n+1} = x$. Note that this algorithm satisfies detailed balance by construction.
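
A minimal sketch of this random walk update is given below. It assumes a spherical Gaussian increment, which is one proposal of the form $q(|y - x|)$, and a standard normal target on $\mathbb{R}^2$ as a toy example; the function name `random_walk_metropolis` and the step size are hypothetical choices made for illustration.

```python
import numpy as np


def random_walk_metropolis(log_pi, x0, step, n_iter, rng):
    """Random walk sampler with a symmetric Gaussian increment of scale `step`."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_pi(x)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        y = x + step * rng.standard_normal(x.size)  # proposal depends only on |y - x|
        lp_y = log_pi(y)
        # accept with probability pi(y) / pi(x) ∧ 1; the proposal density cancels
        if rng.random() < np.exp(min(0.0, lp_y - lp)):
            x, lp = y, lp_y
        chain[t] = x
    return chain


def log_std_normal(x):
    """Unnormalized log density of a standard normal (toy target, an assumption)."""
    return -0.5 * np.sum(x ** 2)


rw_chain = random_walk_metropolis(log_std_normal, np.zeros(2), 1.0, 20000,
                                  np.random.default_rng(0))
```

Only the ratio $\pi(y)/\pi(x)$ is needed, so the target may be known up to a normalizing constant.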

Random walk-type MHG algorithms are some of the most useful and popular MCMC algorithms and consequently their theoretical properties have been thoroughly studied. Mengersen and Tweedie (1996) show that random walk samplers (on $\mathbb{R}^k$) cannot be uniformly ergodic (or uniformly mixing) but they do establish that a random walk MHG algorithm can be geometrically ergodic by verifying the drift condition when $k = 1$ and $\pi$ has tails that decrease exponentially. Roberts and Tweedie (1996) extended their work by establishing the drift condition in the case where $k > 1$. However, Jarner and Hansen (2000) verified the drift condition with a different drift function than that used by Roberts and Tweedie (1996) and obtained more general conditions ensuring geometric ergodicity. On the other hand, if a random walk MHG algorithm is not geometrically ergodic it may still be polynomially ergodic of all orders; see Fort and Moulines (2000).



Proposition 4. (Jarner and Hansen, 2000) Suppose $\pi$ is a positive density on $\mathbb{R}^k$ having continuous first derivatives such that

$$\lim_{|x| \to \infty} \frac{x}{|x|} \cdot \nabla \log \pi(x) = -\infty .$$

Let $A(x) := \{y \in \mathbb{R}^k : \pi(y) \ge \pi(x)\}$ be the region of certain acceptance and assume that there exist $\delta > 0$ and $\epsilon > 0$ such that, for every $x$, $|x - y| \le \delta$ implies $q(y|x) \ge \epsilon$. Then if

$$\liminf_{|x| \to \infty} \int_{A(x)} q(y|x)\,dy > 0$$

the random walk MHG algorithm satisfies the drift condition with the drift function $V(x) = c\pi(x)^{-1/2}$ for some $c > 0$.



Hence, under the conditions of Proposition 4, Theorem ^ guarantees a CLT if $f(x)$ satisfies $f^2(x) \le c\pi(x)^{-1/2}$ for all $x \in \mathbb{R}^k$. Alternatively, we conclude that the random walk MHG is geometrically ergodic, satisfies detailed balance (and hence is asymptotically uncorrelated) and an appeal to Corollary 13 establishes the existence of a CLT if $E_\pi f^2(x) < \infty$.



6 Final Remarks 



The focus has been on some of the connections between recent work on general state space Markov 
chains and results from mixing processes and the implications for Markov chain CLTs. However, 
this article only scratches the surface of the mixing process literature that is potentially useful in 
MCMC. For example, the existence of a functional CLT or strong invariance principle is required in order to estimate $\sigma_f^2$ (Damerdji, 1994; Glynn and Whitt, 1992; Jones et al., 2004). There has been much work on these for mixing processes; Philipp and Stout (1975) is a good starting place for strong invariance principles while Billingsley (1968) gives an introduction to the functional CLT.
A Calculations for Example [2]

Define $V(z) = a^{|z|}$ for some $a > 1$. Then $V(z) \ge 1$ for all $z \in \mathbb{Z}$ and

$$PV(x) = \sum_{y \in \mathbb{Z}} a^{|y|}\, p(x,y) .$$

Recall that $\Delta V(x) = PV(x) - V(x)$. The first goal is to show that if $x \ne 0$ then $\Delta V(x)/V(x) < 0$ since then there must be a $\beta > 0$ such that $\Delta V(x) < -\beta V(x)$. Suppose $X_n = x \ge 1$ then

$$PV(x) = \theta a^{x+1} + 1 - \theta \quad \Rightarrow \quad \frac{\Delta V(x)}{V(x)} = a\theta - 1 + \frac{1 - \theta}{V(x)} < 0$$

as long as

$$(a\theta - 1)V(x) + 1 - \theta < 0 . \tag{19}$$

Now (19) can hold only if $a\theta - 1 < 0$ and since $V(x) \ge a$ for all $x \ne 0$, (19) will hold when $a\theta - 1 < 0$ and $(a\theta - 1)a + 1 - \theta < 0$. A similar argument shows that this is also the case when $X_n = -x \le -1$. Now suppose $X_n = 0$. Then $PV(0) = a$ and $\Delta V(0) = -V(0) + a$. Putting this together yields the drift condition with $C = \{0\}$.
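
The inequality (19) can be checked numerically for particular values. The sketch below uses only the expression for $PV(x)$ given above (from $x \ne 0$ the chain moves one step outward with probability $\theta$ and returns to 0 otherwise); the values $a = 1.5$ and $\theta = 0.3$ and the function name `drift_ratio` are assumptions chosen for illustration.

```python
def drift_ratio(x, a, theta):
    """Return Delta V(x) / V(x) for V(z) = a ** |z| at an integer state x != 0."""
    v = a ** abs(x)
    # PV(x) = theta * a^(|x| + 1) + (1 - theta) * a^0
    pv = theta * a ** (abs(x) + 1) + (1.0 - theta)
    return (pv - v) / v


a, theta = 1.5, 0.3  # illustrative values with a * theta < 1
ratios = [drift_ratio(x, a, theta) for x in range(1, 60)]
beta = -max(ratios)  # any beta below this value works for all x != 0 checked here
```

For these values the largest ratio occurs at $|x| = 1$, where it equals $((a\theta - 1)a + 1 - \theta)/a = -1/12$, in agreement with the observation that the second condition following (19) is the binding one.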